
Data Extraction

Extract structured data from pages using CSS or XPath selectors.

Basic extraction

result = client.scrape(
    "https://quotes.toscrape.com/",
    extract={
        "title": "css:h1",
        "first_quote": "css:.text",
    },
)
print(result.extracted_data["title"])
print(result.extracted_data["first_quote"])
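Indexing extracted_data directly raises an error if a key is absent. The sketch below shows one defensive way to read a field. It assumes extracted_data behaves like a plain dict and that a selector with no match yields either a missing key or None; neither is confirmed by this page. get_field and the sample dict are hypothetical and stand in for a real result.

```python
from typing import Any, Mapping


def get_field(extracted: Mapping[str, Any], key: str, default: str = "") -> str:
    """Return a trimmed string for `key`, or `default` when it is missing or None."""
    value = extracted.get(key)
    if value is None:
        return default
    return str(value).strip()


# Stand-in for result.extracted_data (assumed to be dict-like).
sample = {"title": "  Quotes to Scrape ", "first_quote": None}

print(get_field(sample, "title"))
print(get_field(sample, "first_quote", default="(no match)"))
print(get_field(sample, "missing_key", default="(no match)"))
```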

Multiple values

Use multiple: True to extract all matching elements as a list:

result = client.scrape(
    "https://quotes.toscrape.com/",
    extract={
        "quotes": {"selector": "css:.text", "multiple": True},
        "authors": {"selector": "css:.author", "multiple": True},
    },
)
# Pair each quote with its author and print them side by side.
for quote, author in zip(result.extracted_data["quotes"], result.extracted_data["authors"]):
    print(f"{quote} - {author}")
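Note that zip stops at the shorter list, so if one selector matches fewer elements than the other, the pairs can be cut short or shifted without any error. A minimal sketch of a stricter pairing follows; pair_up is a hypothetical helper, and the sample lists stand in for the "quotes" and "authors" values.

```python
def pair_up(quotes: list[str], authors: list[str]) -> list[tuple[str, str]]:
    """Pair quotes with authors, failing loudly if the lists differ in length."""
    if len(quotes) != len(authors):
        raise ValueError(
            f"selector mismatch: {len(quotes)} quotes vs {len(authors)} authors"
        )
    return list(zip(quotes, authors))


# Stand-ins for result.extracted_data["quotes"] and ["authors"].
sample_quotes = ["Quote one", "Quote two"]
sample_authors = ["Author A", "Author B"]

for quote, author in pair_up(sample_quotes, sample_authors):
    print(f"{quote} - {author}")
```

On Python 3.10 or later, zip(quotes, authors, strict=True) gives the same guarantee without a helper.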

Extract attributes

Extract element attributes like href, src, data-*:

result = client.scrape(
    "https://quotes.toscrape.com/",
    extract={
        "links": {"selector": "css:a", "attribute": "href", "multiple": True},
        "images": {"selector": "css:img", "attribute": "src", "multiple": True},
    },
)
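Attribute values such as href and src are often relative paths. Assuming they come back exactly as written in the page source (an assumption, not something this page states), they can be resolved against the page URL with the standard library. absolutize and the sample links below are illustrative only.

```python
from urllib.parse import urljoin

BASE_URL = "https://quotes.toscrape.com/"


def absolutize(base_url: str, links: list[str]) -> list[str]:
    """Resolve each link against base_url, skipping empty values."""
    return [urljoin(base_url, link) for link in links if link]


# Stand-in for result.extracted_data["links"].
sample_links = ["/author/Albert-Einstein", "/tag/love/", "", "https://www.goodreads.com/quotes"]

for url in absolutize(BASE_URL, sample_links):
    print(url)
```

Links that are already absolute pass through urljoin unchanged, so the helper is safe to apply to mixed lists.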

XPath selectors

result = client.scrape(
    "https://quotes.toscrape.com/",
    extract={
        "quotes": "xpath://span[@class='text']",
        "authors": "xpath://small[@class='author']",
    },
)
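One difference from CSS is worth checking before relying on these expressions: in XPath, @class='text' compares the whole attribute value, so an element with class="text featured" does not match, whereas the CSS selector .text would match it. The sketch below illustrates this locally with the standard library's ElementTree, which supports only a subset of XPath. It is not the engine the client uses, and the markup is a made-up sample.

```python
import xml.etree.ElementTree as ET

# Made-up markup for illustration; the third span carries two classes.
SNIPPET = """
<div>
  <div class="quote">
    <span class="text">First quote</span>
    <small class="author">Author One</small>
  </div>
  <div class="quote">
    <span class="text">Second quote</span>
    <small class="author">Author Two</small>
  </div>
  <div class="quote">
    <span class="text featured">Third quote</span>
    <small class="author">Author Three</small>
  </div>
</div>
""".strip()

root = ET.fromstring(SNIPPET)

# ElementTree needs a leading "." for descendant searches from the root.
quotes = [el.text for el in root.findall(".//span[@class='text']")]
authors = [el.text for el in root.findall(".//small[@class='author']")]

print(quotes)   # the "text featured" span is not included
print(authors)
```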

Extraction + browser

Extraction also works with browser rendering, which is needed when the content is generated by JavaScript:

result = client.scrape(
    "https://spa-app.com/products",
    browser=True,
    extract={
        "names": {"selector": "css:.product-name", "multiple": True},
        "prices": {"selector": "css:.price", "multiple": True},
    },
)
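Extracted values such as prices presumably arrive as display text, so they usually need cleaning before use. The following sketch turns names and prices into records. It assumes the values are strings, that "." is the decimal separator, and that "," is a thousands separator; parse_price, build_products, and the sample data are hypothetical.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_price(text: str) -> Optional[Decimal]:
    """Strip currency symbols and thousands separators; return None if unparseable."""
    cleaned = re.sub(r"[^\d.]", "", text)
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def build_products(names: list[str], prices: list[str]) -> list[dict]:
    """Combine parallel name and price lists into a list of records."""
    return [
        {"name": name.strip(), "price": parse_price(price)}
        for name, price in zip(names, prices)
    ]


# Stand-ins for result.extracted_data["names"] and ["prices"].
sample_names = [" Laptop ", "Mouse", "Cable"]
sample_prices = ["$1,299.00", "€19.99", "N/A"]

for product in build_products(sample_names, sample_prices):
    print(product)
```

Decimal is used instead of float so that monetary values are not subject to binary rounding.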

When to use extract vs EvaluateAction

| Use case | Tool |
| --- | --- |
| Data is visible in the HTML/DOM | extract (CSS/XPath selectors) |
| Data comes from JS variables | EvaluateAction |
| Data comes from internal APIs | EvaluateAction |
| Complex DOM logic needed | EvaluateAction |